On cherche à étudier l’effet de trois facteurs sur le transcriptome des racines d’Arabidopsis. On cherche ici à utiliser et comparer différents packages de multiDE sur nos données.

Import des données

              R48 R39 R38 R28 R29 R30 R25 R31 R27 R33 R32 R26 R36 R37 R35 R34
Sly00g0382731   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
Sly00g0382741   1   1   0   0   0   0   0   0   0   0   0   0   1   0   0   0
Sly00g0382751 400 320 259 280 464 370 228 321 282 348 261 302 309 306 264 238
              R47 R46 R44 R45 R41 R40 R42 R43
Sly00g0382731   2   0   0   0   0   0   0   0
Sly00g0382741   0   0   0   0   0   0   0   0
Sly00g0382751 312 269 300 223 552 342 401 216
 [ reached 'max' / getOption("max.print") -- omitted 3 rows ]
[1] 39110    24
[1] "CnF_3"
[1] Sl_AmbientCO2_LowNitrate_FeStarvation1
48 Levels: At_AmbientCO2_HighNitrate_Fe1 ... Sl_ElevatedCO2_LowNitrate_FeStarvation3
[1] "At_AmbientCO2_LowNitrate_Fe"

Analyse globale avec edgeR

TCC fait le test globalement, avec un layout “one way”, qui prend les 8 conditions comme toutes différentes. En spécifiant un design de combinatoire \(2*2*2\), on peut avoir des coefficients testés contre 0 pour chacune des variables et leur interactions.

glmLRT conducts likelihood ratio tests for one or more coefficients in the linear model. If coef is used, the null hypothesis is that all the coefficients indicated by coef are equal to zero. If contrast is non-null, then the null hypothesis is that the specified contrasts of the coefficients are equal to zero. For example, a contrast of c(0,1,-1), assuming there are three coefficients, would test the hypothesis that the second and third coefficients are equal.

While the likelihood ratio test is a more obvious choice for inferences with GLMs, the QLF-test is preferred as it reflects the uncertainty in estimating the dispersion for each gene. Itprovides more robust and reliable error rate control when the number of replicates is small.The QL dispersion estimation and hypothesis testing can be done by using the functionsglmQLFit()andglmQLFTest()

      group lib.size norm.factors
Cnf_3   Cnf 26741173            1
cNf_3   cNf 21291368            1
cNf_2   cNf 23788658            1
cnF_1   cnF 22016602            1
cnF_2   cnF 29965632            1
cnF_3   cnF 31788665            1
cNF_1   cNF 20074554            1
CNF_1   CNF 27503057            1
cNF_3   cNF 23782396            1
CNF_3   CNF 29424187            1
CNF_2   CNF 26687502            1
cNF_2   cNF 23004843            1
CnF_3   CnF 30395731            1
cNf_1   cNf 25108644            1
CnF_2   CnF 20428037            1
CnF_1   CnF 21068589            1
Cnf_2   Cnf 24932215            1
Cnf_1   Cnf 28581275            1
CNf_2   CNf 25763959            1
CNf_3   CNf 23693250            1
cnf_2   cnf 35031098            1
cnf_1   cnf 25436178            1
cnf_3   cnf 27074819            1
CNf_1   CNf 24584667            1
Design matrix not provided. Switch to the classic mode.

Clustering sur les gènes DE

On utilise des modèles de mélange pour regrouper les gènes ayant des profils d’expression similaires dans les différentes conditions.

Sly02g0349581 Sly05g0137921 Sly04g0251231 Sly06g0375851 Sly08g0273531 
        66563         13206         57922         56933          6862 
Sly08g0282161 Sly10g0190401 Sly12g0071731 Sly11g0288951 Sly05g0155731 
        13240         60057         63640         72422         23644 
Sly04g0251211 Sly12g0056631 Sly03g0209731 Sly01g0030961 Sly03g0228011 
        36456         65188         63184        106750         15113 
Sly06g0370891 Sly02g0341281 Sly05g0142161 Sly10g0184191 Sly02g0324461 
        43186         27735         28696          9770          8643 
Sly05g0157311 Sly06g0371401 Sly01g0000401 Sly03g0209751 Sly08g0262341 
       354599         62243          6741         50002         15671 
Sly06g0352901 Sly07g0126971 Sly06g0353681 Sly12g0053661 Sly03g0222691 
         7532        109995         34317         60176         12172 
Sly08g0263181 Sly04g0251201 Sly12g0070721 Sly05g0156491 Sly01g0035861 
        28078         77220        179473         20904        123637 
Sly02g0349231 Sly03g0223341 Sly01g0022341 Sly12g0067311 Sly04g0247491 
        15702         90410         48343          9501        161474 
Sly09g0095061 Sly11g0294281 Sly05g0145111 Sly01g0039461 Sly09g0099711 
         6223         18517         18702         26841         36932 
Sly05g0143011 Sly06g0370181 Sly01g0003001 Sly05g0138461 Sly11g0307721 
        20587        177001         27616         36703          7975 
Sly10g0183051 Sly03g0209741 Sly08g0283221 Sly06g0372181 Sly06g0355821 
        17552         67718          6226          9483           411 
Sly06g0366131 Sly07g0119791 Sly02g0328831 Sly10g0184161 Sly09g0084621 
        89215         18801         83575          6174         70287 
Sly08g0264571 Sly00g0386741 Sly02g0342271 Sly06g0364641 Sly06g0377981 
        22218         56904         20350          5730          3516 
Sly11g0295831 Sly02g0346201 Sly06g0367601 Sly07g0110001 Sly01g0021751 
       231839        108176         65653        192025         17463 
Sly05g0148761 Sly06g0371951 Sly06g0363891 Sly01g0005461 Sly08g0264231 
       159760         49045         19392          9767          1604 
 [ reached getOption("max.print") -- omitted 748 entries ]
****************************************
coseq analysis: Poisson approach & none transformation
K = 8 to 12 
Use set.seed() prior to running coseq for reproducible results.
****************************************
Running g = 8 ...
[1] "Initialization: 1"
[1] "Log-like diff: 200.733285161674"
[1] "Log-like diff: 34.0168501515287"
[1] "Log-like diff: 4.67799934004114"
[1] "Log-like diff: 4.18226338077555"
[1] "Log-like diff: 0.014664002213415"
Running g = 9 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.00018384684975814"
[1] "Log-like diff: 2.05128579917613e-05"
[1] "Log-like diff: 2.28306505434261e-06"
Running g = 10 ...
[1] "Initialization: 1"
[1] "Log-like diff: 1.23397612128429e-08"
Running g = 11 ...
[1] "Initialization: 1"
[1] "Log-like diff: 3.46912272591684"
[1] "Log-like diff: 0.895422499903454"
[1] "Log-like diff: 0.137634506809182"
[1] "Log-like diff: 0.0176713797276591"
[1] "Log-like diff: 0.00353622179219926"
Running g = 12 ...
[1] "Initialization: 1"
[1] "Log-like diff: 0.120321852380055"
[1] "Log-like diff: 0.0241292474933132"
[1] "Log-like diff: 0.00128440542239971"
[1] "Log-like diff: 0.000177024889818966"
[1] "Log-like diff: 2.41107373533112e-05"
$logLike


$ICL


$profiles


$boxplots


$probapost_boxplots


$probapost_barplots


$probapost_histogram

*************************************************
Model: Poisson
Transformation: none
*************************************************
Clusters fit: 8,9,10,11,12
Clusters with errors: ---
Selected number of clusters via ICL: 12
ICL of selected model: -1703180
*************************************************
Number of clusters = 12
ICL = -1703180
*************************************************
Cluster sizes:
 Cluster 1  Cluster 2  Cluster 3  Cluster 4  Cluster 5  Cluster 6  Cluster 7 
        34         90        125         80         85         94         83 
 Cluster 8  Cluster 9 Cluster 10 Cluster 11 Cluster 12 
        29         14         33         59         97 

Number of observations with MAP > 0.90 (% of total):
816 (99.15%)

Number of observations with MAP > 0.90 per cluster (% of total per cluster):
 Cluster 1 Cluster 2 Cluster 3 Cluster 4 Cluster 5 Cluster 6 Cluster 7
 34        90        123       79        85        94        82       
 (100%)    (100%)    (98.4%)   (98.75%)  (100%)    (100%)    (98.8%)  
 Cluster 8 Cluster 9 Cluster 10 Cluster 11 Cluster 12
 28        14        33         58         96        
 (96.55%)  (100%)    (100%)     (98.31%)   (98.97%)  

ACP

On représente le clustering dans la plan principal d’une ACP

Class: pca dudi
Call: dudi.pca(df = log(dataC + 0.1), center = TRUE, scale = TRUE, 
    scannf = FALSE, nf = 4)

Total inertia: 24

Eigenvalues:
    Ax1     Ax2     Ax3     Ax4     Ax5 
19.2464  1.5504  1.3660  0.6594  0.2463 

Projected inertia (%):
    Ax1     Ax2     Ax3     Ax4     Ax5 
 80.193   6.460   5.692   2.747   1.026 

Cumulative projected inertia (%):
    Ax1   Ax1:2   Ax1:3   Ax1:4   Ax1:5 
  80.19   86.65   92.34   95.09   96.12 

(Only 5 dimensions (out of 24) are shown)

Il semblerait que l’ACP détecte dans un premier temps (avec le premier vecteur principal) la valeur moyenne d’expression, puis l’expression diff induite par le fer.

On peut faire des représentations dans le plan des seconds et troisième axes principaux (le premier traduit le fer, le second la carence en nitrate).

Sur le cercle de corrélations dans le plan 2 et 3, on voit bien que lorsque la carence fer est là, on peut différencier un effet nitrate. Le CO2 est, lui encore difficile à identifier, le 4eme axe de l’ACP ne renseignant pas beaucoup…

 

A work by Océane Cassan

oceane.cassan@supagro.fr